Forecasting Disk Resource Requirements for a Usenet Server

Author

  • Karl L Swartz
Abstract

Three years ago the Stanford Linear Accelerator Center (SLAC) decided to embrace netnews as a site-wide, multi-platform communications tool for the laboratory’s diverse user community. The Usenet newsgroups, as well as other world-wide newsgroup hierarchies, were appealing for their unique ability to tap a broad pool of information, while the availability of the software on a number of platforms provided a way to communicate to and amongst the computing community. The previous way of doing this ran only on the VM mainframe system and had become increasingly ineffective as users migrated to other platforms. The increasing dependence on netnews brought with it the requirement that the service be reliable. This was dramatically demonstrated when the long-neglected netnews service collapsed under the load of the traditional fall surge in Usenet traffic and the site was without news service for a week while an upgraded system was installed. One result of that painful event was that efforts were made to forecast growth and the accompanying hardware requirements so that equipment could be acquired and installed before problems became visible to the users. This paper describes the major on-disk databases associated with news software, then presents an analysis of the storage requirements for these databases based on data collected at SLAC. A model is developed from this data which permits forecasting of disk resource requirements for a full feed as a function of time and local policies. Suggestions are also made as to how to modify this model for sites which do not carry a full feed.

Why NetNews? Why Usenet?

The Stanford Linear Accelerator Center (SLAC) is a medium-sized national research laboratory. The lab’s primary missions are research in elementary particle physics and development of new techniques in particle accelerators. These are large projects that often involve international collaborations and a diverse user community.
This can present a formidable problem for communication with and amongst users, from discussions on the design of new experiments to progress reports on current experiments, as well as the more mundane but equally necessary announcements of network or server outages. There is also a tremendous need to stay in touch with what other researchers are doing.

This work was supported by the United States Department of Energy under contract number DE-AC03-76SF00515.

Presented at the USENIX Association’s Seventh Large Installation System Administration Conference (LISA VII), Monterey, California, November 1-5, 1993.

In the eighties, the majority of computer users at SLAC logged onto an IBM mainframe running VM. The default user profile on VM brought up a VM News session upon login, where announcements could be made. Most users would see them since they logged into VM regularly. Discussions were handled by mailing lists maintained by LISTSERV, plus a home-grown VM conferencing system named CONSPIRE. (VM News was also developed at SLAC.) BITNET mailing lists provided the main contact with researchers at other sites.

There was also a smaller, though still sizable, group of users who used VAX/VMS systems most of the time, if not exclusively. These users tended to be excluded from VM News and from CONSPIRE, though they did make use of BITNET. DECnet mail with other physics sites was also available. This communications block posed some problems but was tolerated at the time.

Over the past few years the computing environment became far more diversified. Unix workstations began to appear, while the PCs, Macintoshes, and Amigas grew powerful enough that their users had dwindling need for VM. The number of users who did not use VM on a regular basis, if at all, increased until the VM-based communication model began to fail completely. A new model was needed that would permit users to read announcements and participate in discussions from whatever platform they were accustomed to using.
The multitude of platforms made another home-grown solution undesirable. The software which supports Usenet (referred to hereafter as netnews software to distinguish it from the network itself), with ready availability of NNTP-based readers for a variety of platforms, seemed to solve the problem except for VM. The discovery of the PennState VM NetNews system completed the solution. B News and NNTP software was built and installed on a Sun fileserver which had some spare disk space, and the PennState NetNews software, with NNTP software from Queen’s University, was installed on VM (footnote 1). Once the bugs were worked out of this system, NNTP-based readers were acquired for other platforms, often with help from interested users.

Footnote 1: VM NetNews is a full news system, not an NNTP-based reader. There are now several NNTP readers available for VM, including a port of rn. These are being investigated, with the intention of consolidating on a single, Unix-based netnews server.

Meanwhile, a local slac newsgroup hierarchy was being populated with new groups and users were being introduced to the new system. There was also a great deal of interest in nonlocal groups, of course, i.e., Usenet. After a great deal of debate over proper use of government-funded equipment, the appropriateness of censorship in an academic/research community, and the feasibility of determining just what was and was not appropriate, it was decided to carry all groups and assume a mature and responsible user community.

Netnews flourished and mostly solved the communication problem, but created a new problem in that users came to expect it to be reliable. By early 1992 the ever-increasing growth of Usenet traffic had begun to severely strain the resources of the Sun, which was trying to run netnews while also handling file service and a variety of other tasks. Spool areas would overflow and the now-obsolete B News software would collapse, causing substantial delays and user ire.
The expiration noose would be tightened another notch, staving off disaster for a bit longer but also aggravating already irate users as articles expired before they could be read.

Work was begun to determine what equipment was needed to support netnews for the next few years, and to define requirements in the form of minimum expiration times. Eventually a SPARCserver 2 with 4.5 GB of disk space was ordered, arriving mere days too late to avert the collapse of SLAC’s netnews service from the traditional September surge in Usenet traffic. While painful at the time, the degree of pain made it clear just how critical netnews had become to SLAC.

Popular reference books on managing Usenet are notably silent on the matter of forecasting resources. [1, 2] Therefore, a study of Usenet growth was begun and has continued, so that future growth needs can be anticipated and handled before another crisis occurs. This paper documents the current state of this ongoing study.

Organization of a netnews server’s data

A netnews server is composed of a number of databases, several of which have the potential to require a substantial amount of disk space and which fluctuate in size as a function of news activity. The three primary databases are the articles themselves, the history file, and the active file. Ancillary structures include thread databases, incoming and outgoing spool areas, and log files. While the following description is based on a Unix system running the C News software, most structures will likely be similar on other systems.

The article database is by far the largest portion of a news system. Unix’s hierarchical directory structure is a handy analog to the newsgroup hierarchy, so the software uses a simple lexical mapping of newsgroup names into directory names (“.” is mapped to “/”) with each article stored in a separate file.
For example, article 42 in group news.announce.important is stored as news/announce/important/42 within the spool directory (often /usr/spool/news). Cross-posted articles are handled by links. These are normally hard links, though there is some support for falling back to symbolic links if a hard link fails (footnote 2).

Footnote 2: The documentation for the February 1993 C News Performance Release describes symlink support with uninspiring phrases like “half-hearted code” and “this has not been tested much recently.” [3] There appears to be more confidence in the symlink support in INN 1.4. [4]

Storing each article in a separate file implies a tremendous number of relatively small files. If one eschews the apparently lightly tested symlink support (an option not likely to be available for large netnews servers for much longer), use of links for cross-posted articles further implies that this entire database must reside within a single disk partition. Unfortunately, the combination of cross-posts, cancellations, varying expiration times for different newsgroups, and the common desire to be able to use grep and other Unix tools on the article database make an alternative implementation difficult and thus unlikely in the near future. (This is a recurrent thread in the news.software.* newsgroups.)

The history file, typically located in /usr/lib/news/history, records the article ID of each article seen recently by the news system along with the time it was received, any explicit expiration time, and, if the article has not yet expired, a list enumerating the newsgroups the article appears in and its sequence number in each group. All of this is stored in a simple, albeit large, file with one article mentioned per line. In order to speed lookups by article ID, an index is maintained, typically using dbz.
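The spool layout and the history record described above can be illustrated with a short Python sketch. It is not taken from C News: the function and class names are hypothetical, and the history line layout used here (tab-separated fields, arrival and expiration times joined by "~" with "-" meaning no explicit expiration, and locations written as group/number) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# Spool directory named in the text as a common default.
SPOOL_DIR = PurePosixPath("/usr/spool/news")


def article_path(newsgroup: str, number: int,
                 spool: PurePosixPath = SPOOL_DIR) -> PurePosixPath:
    """Map a newsgroup name and article number to the article's spool file."""
    # Lexical mapping: every "." in the group name becomes a "/" in the path.
    return spool.joinpath(*newsgroup.split("."), str(number))


@dataclass
class HistoryEntry:
    message_id: str
    received: int                      # arrival time, seconds since the epoch
    expires: Optional[int]             # explicit expiration time, if any
    locations: List[Tuple[str, int]]   # (newsgroup, article number) pairs


def parse_history_line(line: str) -> HistoryEntry:
    """Parse one history line under the ASSUMED layout described in the lead-in."""
    fields = line.rstrip("\n").split("\t")
    message_id, times = fields[0], fields[1]
    received, _, expires = times.partition("~")
    locations: List[Tuple[str, int]] = []
    # Assumption: the location field is absent once the article has expired.
    if len(fields) > 2:
        for ref in fields[2].split():
            group, _, number = ref.rpartition("/")
            locations.append((group, int(number)))
    return HistoryEntry(
        message_id=message_id,
        received=int(received),
        expires=int(expires) if expires not in ("", "-") else None,
        locations=locations,
    )


if __name__ == "__main__":
    print(article_path("news.announce.important", 42))
    # Hypothetical history line for a cross-posted article.
    print(parse_history_line("<1@example.org>\t750000000~-\talt.sewing/7 rec.crafts.misc/12"))
```

A cross-posted article shows up as several entries in `locations`, each of which corresponds to one link in the spool tree, which is why the whole article database is tied to a single partition when hard links are used.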
The last of the primary news databases is the active file, typically /usr/lib/news/active, which lists each newsgroup known to the news system and, for each group, the range of article numbers currently active along with the moderation or other status of the group. Like the history file this is a simple file. It is small, however, and so will not be considered further in this paper.

The largest of the ancillary structures is one or possibly several thread databases, which allow newsreaders to present related articles in a logical order, rather than in whatever order they happened to arrive on the netnews server. One example formed the heart of the trn newsreader. [5] This required about 5% of the space required for the articles, [6] a not insubstantial usage of space which was compounded by the development of several incompatible solutions for other newsreaders. Fortunately, a standard thread database is being adopted in the form of Geoff Collyer’s Overview or NOV (News Overview). Unfortunately, the generality required to support the varying needs of different newsreaders, current and future, demands more space: approximately 10% of the space consumed by articles. [7]

Whatever threading mechanism is used, a separate file is maintained for each newsgroup, with configuration options to choose where these files are stored. One choice is to store them amongst the articles for the newsgroup, e.g., News Overview’s database for the alt.sewing newsgroup might reside in /usr/spool/news/alt/sewing/.overview. Since the article database is already quite large, and possibly constrained to a single partition, a more prudent choice is to use a separate directory tree, e.g., trn might store its database for the same newsgroup in /usr/spool/threads/alt/sewing.th.

The incoming spool area contains articles, or hopefully batches of articles, that have been received but not yet processed by the news system.
As long as the netnews server processes new articles expeditiously, this requires only modest amounts of space. (In fact, INN handles incoming NNTP traffic entirely within memory, so in this case the space requirement is zero.)

Outgoing spool space is even smaller, since it contains no articles, just references to them in the form of article IDs and/or pathnames, and these hopefully don’t stick around very long. For transport mechanisms other than NNTP, e.g., uucp, space will also be required for batches queued for other systems. This paper doesn’t consider this requirement, as most large sites are using NNTP these days.

Finally, there are the various log files, stored under /usr/lib/news. The main log records each article received by the system along with a timestamp and an indication of what was done with the article. Normally this log is restarted each night, with a few days’ worth of old logs kept around. There is also errlog, which hopefully is empty or nearly so, and perhaps batchlog, which should also be quite small.

Factors influencing the size of databases

The size of the various databases described in the previous section depends on a number of factors. Some, such as how long articles are kept before being expired, are controlled by the news administrator. Many others are dependent entirely on external influences and may vary substantially over time.

By far the most ravenous consumer of disk space in netnews is the article database, so it makes sense to scrutinize most carefully those factors which affect the size of this database. The major ones are the average size of an article, and the rate at which new articles are received. Filesystem overhead can be significant as well, but this is a fairly simple function of the other two factors and, of course, the characteristics of the netnews server’s system software.

Other factors include the size of article IDs and newsgroup names, cancellation rates, and cross-post rates.
These play a comparatively minor role in the growth of disk consumption by netnews, as they are only slowly changing, and they have little effect on the article database. A cursory study of these factors, sufficient to establish some baseline data, is all that is attempted here.
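To make the interaction of these factors concrete, the following Python sketch gives a back-of-the-envelope estimate of steady-state article database size from the arrival rate, the average article size, and the expiration time. It is not the model developed in the paper, which is not part of this excerpt. The fragment and inode sizes are assumed defaults, the function names are hypothetical, and the input figures in the usage lines are made up; only the roughly 10% overview overhead comes from the text above.

```python
import math


def spool_bytes(articles_per_day: float, avg_article_bytes: float,
                expire_days: float, fragment_bytes: int = 1024,
                inode_bytes: int = 128) -> int:
    """Rough steady-state size of the article database, in bytes."""
    # At steady state the spool holds arrival rate times retention period.
    articles_on_disk = articles_per_day * expire_days
    # Approximation: treat every article as average-sized, round it up to a
    # whole number of filesystem fragments, and charge one inode per article.
    per_article = (math.ceil(avg_article_bytes / fragment_bytes) * fragment_bytes
                   + inode_bytes)
    return math.ceil(articles_on_disk * per_article)


def overview_bytes(articles_per_day: float, avg_article_bytes: float,
                   expire_days: float, fraction: float = 0.10) -> int:
    """Thread database size as a fraction of raw article bytes (about 10% for NOV)."""
    return round(articles_per_day * avg_article_bytes * expire_days * fraction)


if __name__ == "__main__":
    # Made-up inputs: 1000 articles/day, 2500 bytes each, kept for 10 days.
    print(spool_bytes(1000, 2500, 10))
    print(overview_bytes(1000, 2500, 10))
```

Rounding the average article up to a fragment boundary understates the overhead when sizes vary widely, so a forecast built on real data would work from the measured size distribution instead.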

Similar resources

NewsCache - A High-Performance Cache Implementation for Usenet News

Usenet News is reaching its limits as current traffic strains the available infrastructure. News data volume increases steadily and competition with other Internet services has intensified. Consequently bandwidth requirements are often beyond that provided by typical links and the processing power needed exceeds a single system’s capabilities. A rapidly growing number of users, especially attra...

Different Methods of Long-Term Electric Load Demand Forecasting a Comprehensive Review

Long-term demand forecasting is the first step in planning and developing future generation, transmission and distribution facilities. One of the primary tasks of an electric utility is to accurately predict load demand requirements at all times, especially over the long term. Based on the outcome of such forecasts, utilities coordinate their resources to meet the forecasted demand using a least-co...

UsenetDHT: A Low-Overhead Design for Usenet

Usenet is a popular distributed messaging and file sharing service: servers in Usenet flood articles over an overlay network to fully replicate articles across all servers. However, replication of Usenet’s full content requires that each server pay the cost of receiving (and storing) over 1 Tbyte/day. This paper presents the design and implementation of UsenetDHT, a Usenet system that allows a ...

Three Approaches to Time Series Forecasting of Petroleum Demand in OECD Countries

Petroleum (crude oil) is one of the most important resources of energy and its demand and consumption is growing while it is a non-renewable energy resource. Hence forecasting of its demand is necessary to plan appropriate strategies for managing future requirements. In this paper, three types of time series methods including univariate Seasonal ARIMA, Winters forecasting and Transfer Function-...

The Cyclic News Filesystem: Getting INN To Do More With Less

When Usenet News servers were first implemented, the design principle of storing each Usenet article in a separate file appeared to be sound. However, the number of Usenet News articles posted per day has grown phenomenally in the past decade and shows no sign of abating. To stay ahead of the growth curve, Usenet administrators have been forced to buy faster machines, more RAM, and many more di...

Publication date: 1997